{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7c12609e-3d9a-4ced-a670-4c0a358fac63",
   "metadata": {},
   "source": [
    "# SWIFT在【基础视觉领域】的应用"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a7a064e-3171-44ce-a45a-ba329998807d",
   "metadata": {},
   "source": [
    "#### 使用样例见：https://github.com/modelscope/swift/tree/main/examples/pytorch/cv/notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78a535eb-922f-4a83-9f08-cc4a5ebae377",
   "metadata": {},
   "source": [
    "## 1. 图像分类任务"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79638268-c23f-4188-9811-2c8af15a9a40",
   "metadata": {},
   "source": [
    "### 1.1 安装与导入包\n",
    "- 安装必要的依赖安装包\n",
    "```bash\n",
    "pip install 'ms-swift[aigc]' -U\n",
    "pip install modelscope\n",
    "```\n",
    "- 导入必要的依赖安装包\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5752c15f-e449-44e3-9222-b0e1ad03767d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# basic / third-party\n",
    "import os\n",
    "import tempfile\n",
    "import torch\n",
    "import torchvision\n",
    "import transformers\n",
    "\n",
    "# SWIFT\n",
    "from swift import Swift, SwiftModel, snapshot_download, push_to_hub\n",
    "from swift import AdapterConfig, LoRAConfig, PromptConfig, SideConfig, ResTuningConfig\n",
    "\n",
    "# Modelscope\n",
    "import modelscope\n",
    "from modelscope.pipelines import pipeline\n",
    "from modelscope.models import Model\n",
    "from modelscope.utils.config import Config\n",
    "from modelscope.metainfo import Trainers\n",
    "from modelscope.msdatasets import MsDataset\n",
    "from modelscope.trainers import build_trainer\n",
    "from modelscope.utils.constant import DEFAULT_MODEL_REVISION, Invoke, ModelFile\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc63678e-8b0b-42e3-83d4-4a1c6bdeca70",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 1.2 数据集\n",
    "- [基础模型基准评测集 (FME Benchmark)](https://modelscope.cn/datasets/damo/foundation_model_evaluation_benchmark/dataPeview) 子集 - Oxford Flowers\n",
    "\n",
    "| 序号 |                                   数据集                                  |   描述   | 类别数量 | 训练集数量 | 验证集数量 | 测试集数量 |                                                    样例                                                   |                                             备注                                             |\n",
    "|:----:|:-------------------------------------------------------------------------:|:--------:|:--------:|:----------:|:----------:|:----------:|:---------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------:|\n",
    "|  1  |   [Oxford   Flowers](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)  |   花卉   |    102   |    1020    |    1020    |    6149    |            <img decoding=\"async\"   src=\"resources/images/OxfordFlowers102_image_00001.jpeg\" width=50>           | [预览](https://modelscope.cn/datasets/damo/foundation_model_evaluation_benchmark/dataPeview) |\n",
    "- 加载数据集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6dad2a2-bd32-45e5-9f34-5df9af1a274e",
   "metadata": {},
   "outputs": [],
   "source": [
    "num_classes = 102\n",
    "CLASSES = ['pink primrose', 'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea', 'english marigold', 'tiger lily', 'moon orchid', 'bird of paradise', 'monkshood', 'globe thistle', 'snapdragon', \"colt's foot\", 'king protea', 'spear thistle', 'yellow iris', 'globe-flower', 'purple coneflower', 'peruvian lily', 'balloon flower', 'giant white arum lily', 'fire lily', 'pincushion flower', 'fritillary', 'red ginger', 'grape hyacinth', 'corn poppy', 'prince of wales feathers', 'stemless gentian', 'artichoke', 'sweet william', 'carnation', 'garden phlox', 'love in the mist', 'mexican aster', 'alpine sea holly', 'ruby-lipped cattleya', 'cape flower', 'great masterwort', 'siam tulip', 'lenten rose', 'barbeton daisy', 'daffodil', 'sword lily', 'poinsettia', 'bolero deep blue', 'wallflower', 'marigold', 'buttercup', 'oxeye daisy', 'common dandelion', 'petunia', 'wild pansy', 'primula', 'sunflower', 'pelargonium', 'bishop of llandaff', 'gaura', 'geranium', 'orange dahlia', 'pink-yellow dahlia?', 'cautleya spicata', 'japanese anemone', 'black-eyed susan', 'silverbush', 'californian poppy', 'osteospermum', 'spring crocus', 'bearded iris', 'windflower', 'tree poppy', 'gazania', 'azalea', 'water lily', 'rose', 'thorn apple', 'morning glory', 'passion flower', 'lotus', 'toad lily', 'anthurium', 'frangipani', 'clematis', 'hibiscus', 'columbine', 'desert-rose', 'tree mallow', 'magnolia', 'cyclamen', 'watercress', 'canna lily', 'hippeastrum', 'bee balm', 'ball moss', 'foxglove', 'bougainvillea', 'camellia', 'mallow', 'mexican petunia', 'bromelia', 'blanket flower', 'trumpet creeper', 'blackberry lily']\n",
    "img_test = \"resources/images/OxfordFlowers102_image_00001.jpeg\"\n",
    "train_dataset = MsDataset.load(\n",
    "    'foundation_model_evaluation_benchmark',\n",
    "    namespace='damo',\n",
    "    subset_name='OxfordFlowers',\n",
    "    split='train')\n",
    "\n",
    "eval_dataset = MsDataset.load(\n",
    "    'foundation_model_evaluation_benchmark',\n",
    "    namespace='damo',\n",
    "    subset_name='OxfordFlowers',\n",
    "    split='eval')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58dda012-8315-4f4f-9170-09a16747b512",
   "metadata": {},
   "source": [
    "### 1.3 一站式训练 [Modelscope + Swift]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5dc3b8d9-59f0-43d8-8f39-04a9b7cbc354",
   "metadata": {},
   "source": [
    "#### Vision Transformers (ViT)\n",
    "\n",
    "<img src=\"resources/images/vit.jpg\" width=\"800\" align=\"middle\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "077e3bdd-c051-41e4-815b-a8b80c17d5db",
   "metadata": {},
   "source": [
    "#### Swift - ViT - Adapter\n",
    "\n",
    "<img src=\"resources/images/adapter.png\" width=\"500\" align=\"middle\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee1a70e7-30c8-43ff-864d-c648fad9187e",
   "metadata": {},
   "source": [
    "#### 1.3.1 使用modelscope加载ViT模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95bbaf3f-6364-43b1-9354-00082cc22572",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_id = 'damo/cv_vitb16_classification_vision-efficient-tuning-base'\n",
    "task = 'vision-efficient-tuning'\n",
    "revision = 'v1.0.2'\n",
    "\n",
    "model_dir = snapshot_download(model_id)\n",
    "cfg_dict = Config.from_file(os.path.join(model_dir, ModelFile.CONFIGURATION))\n",
    "cfg_dict.model.head.num_classes = num_classes\n",
    "cfg_dict.CLASSES = CLASSES\n",
    "model = Model.from_pretrained(model_id, task=task, cfg_dict=cfg_dict, revision=revision)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "465ab9f9-f151-4d0e-bc44-88238686c885",
   "metadata": {},
   "source": [
    "#### 1.3.2 查看模型信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "725f9633-c53a-4b12-bb8c-733bc5987f3a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ab695b3-cfef-4288-8ab8-40b4069ef578",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "module_keys = [key for key, _ in model.named_modules()]\n",
    "print(module_keys)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be0372f6-acf8-4894-8fbd-13aea414be9e",
   "metadata": {},
   "source": [
    "#### 1.3.3 配置SwiftConfig + 模型准备"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f626c63f-81c6-4197-ac3a-7c668c953242",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# model.backbone.blocks.0.mlp ~ model.backbone.blocks.11.mlp\n",
    "adapter_config = AdapterConfig(\n",
    "    dim=768,\n",
    "    hidden_pos=0,\n",
    "    target_modules=r'.*blocks\\.\\d+\\.mlp$',\n",
    "    adapter_length=10\n",
    ")\n",
    "model = Swift.prepare_model(model, config=adapter_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ec8ace9-6c1d-4d6c-a929-463be4870c4a",
   "metadata": {},
   "source": [
    "#### 1.3.4 查看微调模型信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bcf1b8dd-3e15-49e3-9224-748fb1c778a8",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(model)\n",
    "print(model.get_trainable_parameters())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01d97bc2-7319-4892-8294-481a3ea8b29d",
   "metadata": {},
   "source": [
    "#### 1.3.5 训练与评测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "405e6923-7fa1-4303-b48e-744cedacbd7d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def cfg_modify_fn(cfg):\n",
    "    cfg.model.head.num_classes = num_classes\n",
    "    cfg.model.finetune = True\n",
    "    cfg.CLASSES = CLASSES\n",
    "    cfg.train.max_epochs = 5\n",
    "    cfg.train.lr_scheduler.T_max = 10\n",
    "    return cfg\n",
    "\n",
    "work_dir = \"tmp/cv_swift_adapter\"\n",
    "kwargs = dict(\n",
    "    model=model,\n",
    "    cfg_file=os.path.join(model_dir, 'configuration.json'),\n",
    "    work_dir=work_dir,\n",
    "    train_dataset=train_dataset,\n",
    "    eval_dataset=eval_dataset,\n",
    "    cfg_modify_fn=cfg_modify_fn,\n",
    ")\n",
    "\n",
    "trainer = build_trainer(name=Trainers.vision_efficient_tuning, default_args=kwargs)\n",
    "trainer.train()\n",
    "result = trainer.evaluate()\n",
    "print(f'Vision-efficient-tuning-adapter train output: {result}.')\n",
    "print(os.system(\"nvidia-smi\"))\n",
    "torch.cuda.empty_cache()\n",
    "del trainer\n",
    "del model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75ccf7fd-9bad-4cd8-a8da-9b257b4468c9",
   "metadata": {},
   "source": [
    "### 1.4 Parameter-Efficient Tuners"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78394968-ad52-4d7c-9168-a9b9cc850a29",
   "metadata": {},
   "source": [
    "#### Swift - ViT - Prompt\n",
    "\n",
    "<img src=\"resources/images/prompt.png\" width=\"500\" align=\"middle\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94d90328-853c-43ef-af8d-fdad281d9e01",
   "metadata": {},
   "source": [
    "#### 1.4.1 模型准备"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d87835d3-c679-4224-84a8-e15043f09543",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_id = 'damo/cv_vitb16_classification_vision-efficient-tuning-base'\n",
    "task = 'vision-efficient-tuning'\n",
    "revision = 'v1.0.2'\n",
    "\n",
    "model_dir = snapshot_download(model_id)\n",
    "cfg_dict = Config.from_file(os.path.join(model_dir, ModelFile.CONFIGURATION))\n",
    "cfg_dict.model.head.num_classes = num_classes\n",
    "cfg_dict.CLASSES = CLASSES\n",
    "model = Model.from_pretrained(model_id, task=task, cfg_dict=cfg_dict, revision=revision)\n",
    "\n",
    "prompt_config = PromptConfig(\n",
    "    dim=768,\n",
    "    target_modules=r'.*blocks\\.\\d+$', \n",
    "    embedding_pos=0, \n",
    "    prompt_length=10,\n",
    "    attach_front=False\n",
    ")\n",
    "\n",
    "model = Swift.prepare_model(model, config=prompt_config)\n",
    "\n",
    "print(model.get_trainable_parameters())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33e87b11-0862-41aa-97e4-7f44578d6e0f",
   "metadata": {},
   "source": [
    "#### 1.4.2 模型训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76f027f7-070e-47f1-a607-37eb3d549a39",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def cfg_modify_fn(cfg):\n",
    "    cfg.model.head.num_classes = num_classes\n",
    "    cfg.model.finetune = True\n",
    "    cfg.CLASSES = CLASSES\n",
    "    cfg.train.max_epochs = 5\n",
    "    cfg.train.lr_scheduler.T_max = 10\n",
    "    return cfg\n",
    "\n",
    "work_dir = \"tmp/cv_swift_prompt\"\n",
    "kwargs = dict(\n",
    "    model=model,\n",
    "    cfg_file=os.path.join(model_dir, 'configuration.json'),\n",
    "    work_dir=work_dir,\n",
    "    train_dataset=train_dataset,\n",
    "    eval_dataset=eval_dataset,\n",
    "    cfg_modify_fn=cfg_modify_fn,\n",
    ")\n",
    "\n",
    "trainer = build_trainer(name=Trainers.vision_efficient_tuning, default_args=kwargs)\n",
    "trainer.train()\n",
    "result = trainer.evaluate()\n",
    "print(f'Vision-efficient-tuning-prompt train output: {result}.')\n",
    "print(os.system(\"nvidia-smi\"))\n",
    "torch.cuda.empty_cache()\n",
    "del trainer\n",
    "del model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f02fe752-760c-4f0b-8076-d1ab4021ef52",
   "metadata": {},
   "source": [
    "### 1.5 Memory-Efficient Tuners"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca90fbfc-bbd0-40cd-9803-544136eaa0cb",
   "metadata": {},
   "source": [
    "#### Swift - ViT - Res-Tuning\n",
    "\n",
    "<img src=\"resources/images/restuningbypass.png\" width=\"700\" align=\"middle\" />\n",
    "\n",
    "*Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7bef020-ebbd-4695-8b21-2df6b198c589",
   "metadata": {},
   "source": [
    "#### 1.5.1 模型准备"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07bd598a-ca36-4e04-a22b-e2f3f8e01af6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "model_id = 'damo/cv_vitb16_classification_vision-efficient-tuning-base'\n",
    "task = 'vision-efficient-tuning'\n",
    "revision = 'v1.0.2'\n",
    "\n",
    "model_dir = snapshot_download(model_id)\n",
    "cfg_dict = Config.from_file(os.path.join(model_dir, ModelFile.CONFIGURATION))\n",
    "cfg_dict.model.head.num_classes = num_classes\n",
    "cfg_dict.CLASSES = CLASSES\n",
    "model = Model.from_pretrained(model_id, task=task, cfg_dict=cfg_dict, revision=revision)\n",
    "\n",
    "restuning_config = ResTuningConfig(\n",
    "    dims=768,\n",
    "    root_modules=r'.*backbone.blocks.0$',\n",
    "    stem_modules=r'.*backbone.blocks\\.\\d+$',\n",
    "    target_modules=r'.*backbone.norm',\n",
    "    target_modules_hook='input',\n",
    "    tuner_cfg='res_adapter',\n",
    ")\n",
    "\n",
    "model = Swift.prepare_model(model, config=restuning_config)\n",
    "\n",
    "print(model.get_trainable_parameters())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1bc0ac52-9d92-4eb2-b299-0896c33d78ee",
   "metadata": {},
   "source": [
    "#### 1.5.2 模型训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09ccc2f4-059b-4502-8d97-9efb169d121f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def cfg_modify_fn(cfg):\n",
    "    cfg.model.head.num_classes = num_classes\n",
    "    cfg.model.finetune = True\n",
    "    cfg.CLASSES = CLASSES\n",
    "    cfg.train.max_epochs = 5\n",
    "    cfg.train.lr_scheduler.T_max = 10\n",
    "    return cfg\n",
    "\n",
    "work_dir = \"tmp/cv_swift_restuning\"\n",
    "kwargs = dict(\n",
    "    model=model,\n",
    "    cfg_file=os.path.join(model_dir, 'configuration.json'),\n",
    "    work_dir=work_dir,\n",
    "    train_dataset=train_dataset,\n",
    "    eval_dataset=eval_dataset,\n",
    "    cfg_modify_fn=cfg_modify_fn,\n",
    ")\n",
    "\n",
    "trainer = build_trainer(name=Trainers.vision_efficient_tuning, default_args=kwargs)\n",
    "trainer.train()\n",
    "result = trainer.evaluate()\n",
    "print(f'Vision-efficient-tuning-restuning train output: {result}.')\n",
    "print(os.system(\"nvidia-smi\"))\n",
    "torch.cuda.empty_cache()\n",
    "del trainer\n",
    "del model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ec9fda9-89bc-4dd7-8b21-f4a66cadf552",
   "metadata": {},
   "source": [
    "### 1.6 更多基础模型及工具包使用样例"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c24c2d64-f050-45ff-b34f-e07c8eeb1dde",
   "metadata": {},
   "source": [
    "#### 1.6.1 Transformers\n",
    "\n",
    "安装依赖包：pip install transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e36bf7a9-ef4d-4788-91bc-682d665ecff1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 创建模型\n",
    "from transformers import AutoModelForImageClassification\n",
    "\n",
    "model_dir = snapshot_download(\"AI-ModelScope/vit-base-patch16-224\")\n",
    "model = AutoModelForImageClassification.from_pretrained(model_dir)\n",
    "module_keys = [key for key, _ in model.named_modules()]\n",
    "print(module_keys)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f074f2e0-3ba9-4f2c-ae22-75f92adde979",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 创建微调模型\n",
    "prompt_config = PromptConfig(\n",
    "    dim=768,\n",
    "    target_modules=r'.*layer\\.\\d+$', \n",
    "    embedding_pos=0, \n",
    "    prompt_length=10, \n",
    "    attach_front=False \n",
    ")\n",
    "\n",
    "adapter_config = AdapterConfig(\n",
    "    dim=768, \n",
    "    hidden_pos=0,  \n",
    "    target_modules=r'.*attention.output.dense$',  \n",
    "    adapter_length=10 \n",
    ")\n",
    "\n",
    "model = Swift.prepare_model(model, {\"adapter_tuner\": adapter_config, \"prompt_tuner\": prompt_config})\n",
    "model.get_trainable_parameters()\n",
    "print(model(torch.ones(1, 3, 224, 224)).logits.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69a8e3cc-95ca-4576-9a57-cd06f6cac278",
   "metadata": {},
   "source": [
    "#### 1.6.2 TIMM\n",
    "\n",
    "安装依赖包：pip install timm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c533bd0-da55-4044-9fcc-2f39dad3b456",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 创建模型\n",
    "import timm\n",
    "\n",
    "model = timm.create_model(\"vit_base_patch16_224\", pretrained=False, num_classes=100)\n",
    "module_keys = [key for key, _ in model.named_modules()]\n",
    "print(module_keys)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a5eb5d3-d29c-41bb-b6c4-499d795424a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 创建模型\n",
    "restuning_config = ResTuningConfig(\n",
    "            dims=768,\n",
    "            root_modules=r'.*blocks.0$',\n",
    "            stem_modules=r'.*blocks\\.\\d+$',\n",
    "            target_modules=r'norm',\n",
    "            tuner_cfg='res_adapter'\n",
    ")\n",
    "\n",
    "model = Swift.prepare_model(model, restuning_config)\n",
    "model.get_trainable_parameters()\n",
    "print(model(torch.ones(1, 3, 224, 224)).shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "416ed083-2d2b-4b63-aa17-0316fe6cd244",
   "metadata": {},
   "source": [
    "### 1.7 更多任务"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fbd8b13-2f2f-435f-94b0-8c06e1c6b618",
   "metadata": {},
   "source": [
    "SWIFT提供的是对模型层面进行微调的能力，故当不同的任务采用相似的基础模型架构时，即可泛化到不同的任务中，如检测、分割等。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2dd914a-750c-4723-b334-c1824a2e63b5",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "aigc_env",
   "language": "python",
   "name": "aigc_env"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
